More aggressively detect and disconnect ill-behaving or broken clients#5017
Open
gefjon wants to merge 3 commits into
Open
More aggressively detect and disconnect ill-behaving or broken clients#5017gefjon wants to merge 3 commits into
gefjon wants to merge 3 commits into
Conversation
BSATN parsing generally accepts the case where there are unused trailing bytes in a buffer after parsing a type. This allows both building up compound-typed objects by repeatedly parsing their members, and packing multiple values into the same buffer sequentially. However, it has an unfortunate consequence when parsing untrusted inputs: if a client submits an input e.g. for a reducer call which is not of the expected type, but has a prefix that parses at the expected type, a direct use of the BSATN parser will accept it and silently ignore the suffix. One simple example is a client attempting to submit an i64 when SpacetimeDB expects an i32, resulting in the high 4 bytes of the client submission being ignored, potentially resulting in a different number being parsed than the one submitted. In this commit, we check when parsing user-submitted function arguments that not only did the parse succeed, but that it also consumed the entire input. If the entire input was not consumed, we treat it as an error in the class described above.
When a client sends an invalid `CallReducer` or `CallProcedure` message, previously, we'd send them an error response but continue their connection. That was silly; there's significant classes of error which mean the connection is broken and should be killed. With this change, a client-supplied invalid `CallReducer` will result in a disconnect. A client-supplied invalid `CallProcedure` will panic due to hitting a `todo!()` which will be filled in in a subsequent commit. As part of this change, I defined `CloseFrame` in spacetimedb-core as a mirror to tunstenite's CloseFrame. This led to a minor audit of our close codes, and revealed one incorrect use: `CloseCode::Error` is defined by the spec as "internal error", but we were sending it in response to a client error. I have replaced it with `CloseCode::Protocol`, which is "protocol error".
This commit adds and implements `ClientConnectionSender::disconnect`, which does what you expect. This required adding a new member to `ClientConnectionSender`, the `disconnect_tx`, along which disconnect messages are sent. Adding the `disconnect_tx` also made it convenient to tidy the `UnorderedWsMessage` channel and rename that concept to `WsControlMessage`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of Changes
Contains two separate but related fixes:
Reject reducer and procedure args which parse but leave behind trailing bytes
Fixes #4945 .
BSATN parsing generally accepts the case where there are unused trailing bytes in a buffer after parsing a type. This allows both building up compound-typed objects by repeatedly parsing their members, and packing multiple values into the same buffer sequentially.
However, it has an unfortunate consequence when parsing untrusted inputs: if a client submits an input e.g. for a reducer call which is not of the expected type, but has a prefix that parses at the expected type, a direct use of the BSATN parser will accept it and silently ignore the suffix. One simple example is a client attempting to submit an i64 when SpacetimeDB expects an i32, resulting in the high 4 bytes of the client submission being ignored, potentially resulting in a different number being parsed than the one submitted.
In this commit, we check when parsing user-submitted function arguments that not only did the parse succeed, but that it also consumed the entire input. If the entire input was not consumed, we treat it as an error in the class described above.
Disconnect clients which issue invalid function calls
Fixes #5002 , #4943 . Prior to this change, clients who sent invalid or ill-typed function calls would receive error responses, but would not be disconnected. Clients which didn't handle those errors would continue sending invalid function calls continuously, consuming server time without performing useful work.
With this change, errors encountered when calling reducers or procedures are divided into three categories:
Invalid, meaning the client is broken and should not attempt to reconnect. Invalid and ill-typed function call arguments, as described above, are in this category.Again, meaning the client should attempt to reconnect. These are things that (may) indicate that a leader failover has occurred and the leader replica is now housed on a different SpacetimeDB node.Notably, this PR does not include any changes to any of the client SDKs. We may recognize the
Againclose code and attempt to reconnect in the future, but do not do so currently.As part of this change, I introduced a new mechanism by which the
corecrate can disconnect aClientConnectionSender, and changed the existing disconnect when the client exceeds its queue limit to use that new mechanism.API and ABI breaking changes
I'm not sure "ABI break" is technically the correct label, but this is from a certain perspective a breaking change to the raw WebSocket API, in the sense that we now reject some requests that previously we would accept, and close some connections that we would previously have left open.
Expected complexity level and risk
For the user-facing changes
2: possible some user somewhere has been relying on this behavior and sending unused bytes in the arguments part of a
CallReducerorCallProceduremessage somehow. Most likely this would be a user of the raw WebSocket format, but it could also be someone with outdated generated bindings in a benign way.For the internal change to
ClientConnectionSenderand the WebSocket loop3: this code is tricky. In particular, there's some concern that this change breaks ping/ponging, disconnecting clients which fill their queues, or some other aspect of the connection lifecycle.
Testing